- Cooler than parking tickets
- Access to ~31,000 observations, over 5 years!
- Have you seen The Wire?
2/4/2018
- library(sf) %>% tidyverse for spatial data exploration
- library(spdep) and glm()
library(geojsonio) # get ODP data
library(tidyverse) # duh
library(magrittr) # %<>% life
library(viridis) # it's pretty
library(sf) # the new kid
library(spdep) # the grandaddy
Top 3 addresses for drug and non-drug crime
arrange(crime_counts, -n) %>% group_by(drug_flag) %>% slice(1:3)
## # A tibble: 6 x 3
## # Groups:   drug_flag [2]
##   address                             drug_flag     n
##   <chr>                               <chr>     <int>
## 1 600 E MARKET ST Charlottesville VA  drugs       410
## 2 400 GARRETT ST Charlottesville VA   drugs        38
## 3 700 PROSPECT AVE Charlottesville VA drugs        38
## 4 600 E MARKET ST Charlottesville VA  not_drugs   635
## 5 700 PROSPECT AVE Charlottesville VA not_drugs   412
## 6 1100 5TH ST SW Charlottesville VA   not_drugs   341
The police station's address is 606 E Market Street….
"The answer is quite simple - when individuals walk in to the police department to file a report the physical address of the department (606 E Market Street) is often used in that initial report if no other known address is available at the time. This is especially true for incidents of found or lost property near the downtown mall where there is no true known incident location. The same is true for any warrant services that result in a police report occurring at the police department." - CPD
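station_props <- arrange(crime_counts, -n) %>%
  group_by(drug_flag) %>%
  add_count(wt = n) %>% # nn = total reports in each drug_flag group
  slice(1) # keep the top address in each group
with(station_props, prop.test(n, nn)) %>% broom::tidy() # tidy() lives in broom, which library(tidyverse) does not attach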
##   estimate1  estimate2 statistic p.value parameter  conf.low conf.high
## 1 0.2221018 0.02175626  2135.459       0         1 0.1810225 0.2196687
##                                                                 method
## 1 2-sample test for equality of proportions with continuity correction
##   alternative
## 1   two.sided
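No, they are not: the station's address accounts for about 22% of drug reports but only about 2% of non-drug reports.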
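As a sanity check, the same comparison can be rerun from counts alone. This is a minimal sketch: the station counts (410 and 635) come from the top-3 table, but the group totals (1846 and 29187) are not stated in these slides. They are back-calculated from the printed estimates and should be treated as assumptions.

```r
# Reports logged at the station's address, taken from the top-3 table
station <- c(drugs = 410, not_drugs = 635)
# ASSUMPTION: group totals back-calculated from the printed estimates
# (410 / 0.2221018 and 635 / 0.02175626); they are not given in the slides
totals <- c(drugs = 1846, not_drugs = 29187)

# Two-sample test for equality of proportions (continuity correction is the default)
res <- prop.test(station, totals)
res$estimate # share of each group's reports logged at the station
res$p.value
```

If the assumed totals are right, the two estimates should line up with `estimate1` and `estimate2` in the output above.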
Census blocks make a lot of sense because:
library(tidycensus)
long_url <- "https://opendata.arcgis.com/datasets/e60c072dbb734454a849d21d3814cc5a_14.geojson"
census <- geojsonio::geojson_read(long_url, what = "sp") %>%
  st_as_sf()
ggplot(census, aes(fill = HU_Vacant / Housing_Units)) +
  geom_sf() + scale_fill_viridis()
crime <- read_csv("https://github.com/NathanCDay/cville_crime/raw/master/crime_geocode.csv")
crime %<>% filter(complete.cases(.))
crime %<>% filter(address != "600 E MARKET ST Charlottesville VA")
Convert to sf, with the same Coordinate Reference System (critical)
crime %<>% st_as_sf(coords = c("lon", "lat"), crs = st_crs(census))
sf::st_within() and friends
crime %<>% mutate(within = st_within(crime, census) %>% as.numeric) %>%
  filter(!is.na(within))
There are a bunch of other great st_x(sf_a, sf_b) functions too. If you want to do it, there's a tool for it.
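To see what st_within() hands back, here is a small self-contained sketch on made-up geometry. The two unit-square "blocks", the three points, the EPSG:3857 CRS and the names `square`, `blocks`, `pts` and `block_index` are all hypothetical. The result is a sparse list with one element per point, so a point outside every polygon yields an empty element rather than an NA, which is worth handling explicitly.

```r
library(sf)

# Two hypothetical unit-square "blocks" side by side (made-up projected coordinates)
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
blocks <- st_sf(block_id = c("a", "b"),
                geometry = st_sfc(square(0), square(1), crs = 3857))

# Three hypothetical incidents: one in each block and one outside both
pts <- data.frame(x = c(0.5, 1.5, 5), y = c(0.5, 0.5, 5))
pts <- st_as_sf(pts, coords = c("x", "y"), crs = st_crs(blocks)) # same CRS as the blocks

# Sparse result: for each point, the indices of the blocks that contain it
hits <- st_within(pts, blocks)

# First containing block per point, or NA when the point falls outside every block
block_index <- vapply(hits, function(i) if (length(i) > 0) i[1] else NA_integer_, integer(1))
block_index
```

The explicit `vapply()` is just one way to flatten the sparse list; it makes the "no match" case visible instead of relying on coercion.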
crime %<>% mutate(drug_flag = ifelse(grepl("drug", Offense, ignore.case = TRUE),
"drugs", "not_drugs"))
tidyverse
crime_block <- st_set_geometry(crime, NULL) %>% # remove geometry for spread() to work
  group_by(within, drug_flag) %>%
  count() %>%
  spread(drug_flag, n) %>%
  mutate(frac_drugs = drugs / sum(drugs + not_drugs)) %>%
  ungroup() # geom_sf doesn't care for grouped dfs/tbls
census %<>% inner_join(crime_block, by = c("OBJECTID" = "within"))
ggplot(census, aes(fill = frac_drugs)) +
geom_sf() + scale_fill_viridis()
Test with Moran's I statistic
##   statistic p.value parameter                            method
## 1 0.2129707   0.021       979 Monte-Carlo simulation of Moran I
##   alternative
## 1     greater
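The call that produced this output is not shown in these slides; the method string looks like what spdep's moran.mc() reports. To make the idea concrete, here is a minimal base-R sketch of a permutation test for Moran's I on a made-up 4 x 4 grid. The grid, the values and the helper name `morans_i` are hypothetical, and this is not the code behind the result above.

```r
set.seed(1) # make the permutations reproducible

# Toy 4 x 4 grid; cells are neighbours when they share an edge (rook adjacency)
cells <- expand.grid(col = 1:4, row = 1:4)
W <- (as.matrix(dist(cells, method = "manhattan")) == 1) * 1
W <- W / rowSums(W) # row-standardise so each cell's weights sum to 1

# Moran's I: weighted cross-product of deviations, scaled by the total variation
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

x <- cells$col # made-up values with a left-to-right gradient, i.e. clustered
observed <- morans_i(x, W)

# Null distribution: shuffle the values across cells and recompute the statistic
nsim <- 999
simulated <- replicate(nsim, morans_i(sample(x), W))

# One-sided ("greater") Monte-Carlo p-value
p_value <- (sum(simulated >= observed) + 1) / (nsim + 1)
c(statistic = observed, p.value = p_value)
```

Under this scheme, and assuming 999 shuffles were used, a p-value of 0.021 would mean that 20 of the shuffled data sets gave a statistic at least as large as the observed one.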
Are there other community metrics that are correlated?
Median income comes from the American Community Survey via library(tidycensus); it supplements the housing and demographic variables in the original Census data from ODP.
glm() to fit the highly correlated predictors simultaneously.
mod <- glm(frac_drugs ~ frac_black + income,
           data = census, family = quasibinomial())
##                  Estimate   Std. Error    t value     Pr(>|t|)
## (Intercept) -3.171125e+00 2.468247e-01 -12.847681 1.329019e-14
## frac_black   1.289115e+00 3.746045e-01   3.441270 1.551671e-03
## income      -4.334410e-06 3.099390e-06  -1.398472 1.710270e-01
The proportion of the population that is black is a significant predictor (p ≈ 0.002), but median income is not (p ≈ 0.17).
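Because quasibinomial() uses a logit link by default, the coefficients sit on the log-odds scale. Here is a small sketch of how one might read them. The frac_black coefficient is copied from the table above, while the simulated data frame `toy` and its columns `pred` and `frac` are made up purely to show the shape of the output.

```r
# Odds ratio implied by the reported frac_black coefficient
# (a block going from 0% to 100% black, with income held fixed)
exp(1.289115)

# Hypothetical toy data: a fraction response and a single simulated predictor
set.seed(42)
toy <- data.frame(pred = runif(40))
toy$frac <- plogis(-3 + 1.3 * toy$pred + rnorm(40, sd = 0.3))

# Same family as the model above; a fractional response is allowed here
toy_mod <- glm(frac ~ pred, data = toy, family = quasibinomial())

# Coefficient matrix with Estimate, Std. Error, t value and Pr(>|t|) columns
coef_table <- coef(summary(toy_mod))
coef_table
```

The first line gives an odds ratio of roughly 3.6. That is a statement about odds, not about the fraction itself, so it should not be read as "3.6 times the share of drug reports".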
##     statistic p.value parameter                            method
## 1 -0.07945691   0.694       306 Monte-Carlo simulation of Moran I
##   alternative
## 1     greater
Does drug enforcement target black communities?
More steps:
Get data about police patrol locations/frequency
Dig deeper on the crime reporting procedure
How many of these "drug" crimes are low-level offenses?
Add temporal elements to the model i.e. seasonal, time of day